{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "bff31dbbccd2e580",
   "metadata": {},
   "source": [
    "# Task 1: Association Rule Mining\n",
    "## 1.1 Data Discretization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "initial_id",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-10-22T08:49:58.508207Z",
     "start_time": "2024-10-22T08:49:58.254400Z"
    },
    "collapsed": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   Temperature (C) temperature_discrete  Humidity humidity_discrete  \\\n",
      "0         0.577778                   低温      0.89               高湿度   \n",
      "1         1.161111                   低温      0.85               高湿度   \n",
      "2         1.666667                   低温      0.82               高湿度   \n",
      "3         1.711111                   低温      0.82               高湿度   \n",
      "4         1.183333                   低温      0.86               高湿度   \n",
      "\n",
      "   Wind Speed (km/h) windspeed_discrete  \n",
      "0            17.1143                中速风  \n",
      "1            16.6152                中速风  \n",
      "2            20.2538                高速风  \n",
      "3            14.4900                中速风  \n",
      "4            13.9426                中速风  \n",
      "Formatted Date                object\n",
      "Summary                       object\n",
      "Precip Type                   object\n",
      "Temperature (C)              float64\n",
      "Apparent Temperature (C)     float64\n",
      "Humidity                     float64\n",
      "Wind Speed (km/h)            float64\n",
      "Wind Bearing (degrees)       float64\n",
      "Visibility (km)              float64\n",
      "Pressure (millibars)         float64\n",
      "Daily Summary                 object\n",
      "Hour                           int64\n",
      "TimeOfDay                     object\n",
      "Month                          int64\n",
      "Season                        object\n",
      "WindSpeedGroup                object\n",
      "WindDirection                 object\n",
      "temperature_discrete        category\n",
      "humidity_discrete           category\n",
      "windspeed_discrete          category\n",
      "dtype: object\n"
     ]
    }
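   ],
   "source": [
    "\"\"\"Discretize continuous variables such as temperature and humidity with pd.cut, which splits continuous values into labeled intervals.\"\"\"\n",
    "import pandas as pd\n",
    "\n",
    "# Load the data\n",
    "data = pd.read_csv('fyx.csv')\n",
    "\n",
    "# Note: pd.cut uses right-closed intervals by default, e.g. (-10, 0], (0, 10], ...\n",
    "# Values outside the bin range (a wind speed of exactly 0, or a temperature above 30) become NaN.\n",
    "\n",
    "# Discretize temperature (labels: very low / low / medium / high temperature)\n",
    "data['temperature_discrete'] = pd.cut(data['Temperature (C)'], bins=[-10, 0, 10, 20, 30], labels=[\"非常低温\", \"低温\", \"中温\", \"高温\"])\n",
    "\n",
    "# Discretize humidity (labels: low / medium / high humidity)\n",
    "data['humidity_discrete'] = pd.cut(data['Humidity'], bins=[0, 0.5, 0.75, 1], labels=[\"低湿度\", \"中湿度\", \"高湿度\"])\n",
    "\n",
    "# Discretize wind speed (labels: low / medium / high wind speed)\n",
    "data['windspeed_discrete'] = pd.cut(data['Wind Speed (km/h)'], bins=[0, 10, 20, 30], labels=[\"低速风\", \"中速风\", \"高速风\"])\n",
    "\n",
    "# Inspect the discretized columns, save the result, and check the dtypes\n",
    "print(data[['Temperature (C)', 'temperature_discrete', 'Humidity', 'humidity_discrete', 'Wind Speed (km/h)', 'windspeed_discrete']].head())\n",
    "data.to_csv('离散化数据.csv', index=False)\n",
    "print(data.dtypes)\n"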
   ]
  },
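  {
   "cell_type": "markdown",
   "id": "a7c2e91f4b6d0813",
   "metadata": {},
   "source": [
    "As a sketch of what the `pd.cut` call does for wind speed, assuming pandas' default right-closed intervals, the mapping from a wind speed $v$ (km/h) to its label $g(v)$ can be written as follows. The symbol $g$ is introduced here only for illustration.\n",
    "\n",
    "$$g(v) = \\begin{cases}\n",
    "\\text{低速风} & 0 < v \\le 10 \\\\\n",
    "\\text{中速风} & 10 < v \\le 20 \\\\\n",
    "\\text{高速风} & 20 < v \\le 30 \\\\\n",
    "\\text{NaN} & \\text{otherwise}\n",
    "\\end{cases}$$\n",
    "\n",
    "For example, 17.1143 km/h maps to 中速风 and 20.2538 km/h maps to 高速风, matching the printed rows. A wind speed of exactly 0 falls outside every interval and is left as NaN."
   ]
  },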
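  {
   "cell_type": "markdown",
   "id": "874985dbf390de14",
   "metadata": {},
   "source": [
    "## 1.2 Building the Transaction Set\n",
    "Convert each record into a transaction consisting of its discretized feature values and its categorical attributes."
   ]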
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "5dbddabc3e2386",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-10-22T08:50:01.416644Z",
     "start_time": "2024-10-22T08:50:01.311532Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['低温', '高湿度', '中速风', 'Partly Cloudy', '东南', '冬季', '凌晨', 'Mostly cloudy throughout the day.']\n",
      "['低温', '高湿度', '中速风', 'Mostly Cloudy', '东南', '冬季', '凌晨', 'Mostly cloudy throughout the day.']\n",
      "['低温', '高湿度', '高速风', 'Mostly Cloudy', '东南', '冬季', '凌晨', 'Mostly cloudy throughout the day.']\n",
      "['低温', '高湿度', '中速风', 'Overcast', '东南', '冬季', '凌晨', 'Mostly cloudy throughout the day.']\n",
      "['低温', '高湿度', '中速风', 'Mostly Cloudy', '东', '冬季', '凌晨', 'Mostly cloudy throughout the day.']\n"
     ]
    }
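   ],
   "source": [
    "# Select the discretized features plus the categorical columns\n",
    "# Note: astype(str) turns any NaN (e.g. out-of-range bin values) into the literal string 'nan', which then counts as an item\n",
    "transactions = data[['temperature_discrete', 'humidity_discrete', 'windspeed_discrete', 'Summary', 'WindDirection','Season','TimeOfDay','Daily Summary']].astype(str)\n",
    "\n",
    "# Convert each row into a transaction (a list of items)\n",
    "transactions_list = transactions.values.tolist()\n",
    "# Print the first 5 transactions\n",
    "for i in range(5):\n",
    "    print(transactions_list[i])"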
   ]
  },
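  {
   "cell_type": "markdown",
   "id": "8b12abc38a06d97e",
   "metadata": {},
   "source": [
    "## 1.3 Association Rule Mining with the Apriori Algorithm\n",
    "Use the Apriori implementation in the mlxtend library to mine frequent itemsets, then derive association rules from them.\n",
    "\n",
    "The columns of the resulting rules table are:\n",
    "* antecedents: the left-hand side of the rule, i.e. the condition.\n",
    "* consequents: the right-hand side of the rule, i.e. the outcome.\n",
    "* antecedent support: the fraction of records that contain the antecedent.\n",
    "* consequent support: the fraction of records that contain the consequent.\n",
    "* support: the fraction of records that contain both the antecedent and the consequent.\n",
    "* confidence: the probability of the consequent given the antecedent, computed as support(antecedents ∪ consequents) / support(antecedents), where ∪ denotes the combined itemset.\n",
    "* lift: how much more likely the consequent is when the antecedent holds than it is overall, computed as confidence / support(consequents). A value above 1 means the antecedent makes the consequent more likely; a value of 1 means the two are independent.\n",
    "* leverage: the gap between the observed joint frequency and the frequency expected under independence, computed as support(antecedents ∪ consequents) - (support(antecedents) * support(consequents)). A value of 0 indicates independence; larger positive values indicate a stronger positive association.\n",
    "* conviction: computed as (1 - support(consequents)) / (1 - confidence). It compares how often the antecedent would occur without the consequent under independence with how often it actually does. Larger values mean the consequent depends more strongly on the antecedent.\n",
    "* zhangs_metric: Zhang's metric, which ranges from -1 to 1. Positive values indicate a positive association, negative values indicate dissociation, and 0 indicates independence."
   ]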
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "7079fc1a023f4177",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-10-22T08:51:17.638667Z",
     "start_time": "2024-10-22T08:51:17.553659Z"
    }
   },
   "outputs": [],
   "source": [
    "from mlxtend.preprocessing import TransactionEncoder\n",
    "from mlxtend.frequent_patterns import apriori, association_rules\n",
    "\n",
    "# Convert to the one-hot format that apriori expects\n",
    "te = TransactionEncoder() # create the encoder\n",
    "transactions_encoded = te.fit(transactions_list).transform(transactions_list) # one-hot encode the transactions\n",
    "df_transactions = pd.DataFrame(transactions_encoded, columns=te.columns_) # wrap the result in a DataFrame\n",
    "\n",
    "# Mine frequent itemsets with a minimum support of 0.05\n",
    "frequent_itemsets = apriori(df_transactions, min_support=0.05, use_colnames=True)\n",
    "\n",
    "# Generate association rules with a minimum confidence of 0.6\n",
    "rules = association_rules(frequent_itemsets, metric=\"confidence\", min_threshold=0.6)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "337adf8e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        antecedents consequents  antecedent support  consequent support  \\\n",
      "2           (Foggy)       (高湿度)            0.074127            0.552344   \n",
      "9            (非常低温)        (冬季)            0.105964            0.247146   \n",
      "11             (凌晨)       (高湿度)            0.249987            0.552344   \n",
      "14           (非常低温)       (高湿度)            0.105964            0.552344   \n",
      "15     (Foggy, 低速风)       (高湿度)            0.051986            0.552344   \n",
      "25   (低温, Overcast)       (高湿度)            0.083274            0.552344   \n",
      "26  (低速风, Overcast)       (高湿度)            0.068185            0.552344   \n",
      "27   (冬季, Overcast)       (高湿度)            0.069564            0.552344   \n",
      "34         (下午, 高温)       (低湿度)            0.076274            0.161186   \n",
      "43         (凌晨, 夏季)        (中温)            0.062948            0.345156   \n",
      "46        (高湿度, 夏季)        (中温)            0.096724            0.345156   \n",
      "47        (中湿度, 高温)        (夏季)            0.075911            0.251843   \n",
      "53        (低速风, 低温)       (高湿度)            0.146522            0.552344   \n",
      "55         (低温, 凌晨)       (高湿度)            0.102096            0.552344   \n",
      "58         (低温, 秋季)       (高湿度)            0.099804            0.552344   \n",
      "59      (低速风, 非常低温)        (冬季)            0.061040            0.247146   \n",
      "60        (低速风, 冬季)       (高湿度)            0.111325            0.552344   \n",
      "61        (低速风, 凌晨)       (高湿度)            0.141648            0.552344   \n",
      "67      (低速风, 非常低温)       (高湿度)            0.061040            0.552344   \n",
      "69         (冬季, 凌晨)       (高湿度)            0.061786            0.552344   \n",
      "70         (冬季, 夜晚)       (高湿度)            0.061786            0.552344   \n",
      "71         (冬季, 早晨)       (高湿度)            0.061786            0.552344   \n",
      "72       (冬季, 非常低温)       (高湿度)            0.088521            0.552344   \n",
      "73      (高湿度, 非常低温)        (冬季)            0.090305            0.247146   \n",
      "74           (非常低温)   (冬季, 高湿度)            0.105964            0.199152   \n",
      "76         (秋季, 凌晨)       (高湿度)            0.062284            0.552344   \n",
      "77    (低速风, 中温, 凌晨)       (高湿度)            0.062346            0.552344   \n",
      "80   (低速风, 高湿度, 夏季)        (中温)            0.065717            0.345156   \n",
      "\n",
      "     support  confidence      lift  leverage  conviction  zhangs_metric  \n",
      "2   0.073826    0.995943  1.803120  0.032883  110.339434       0.481066  \n",
      "9   0.088521    0.835389  3.380148  0.062333    4.573524       0.787614  \n",
      "11  0.210103    0.840455  1.521614  0.072024    2.805822       0.457063  \n",
      "14  0.090305    0.852222  1.542918  0.031776    3.029236       0.393583  \n",
      "15  0.051810    0.996609  1.804326  0.023096  132.005792       0.470221  \n",
      "25  0.071576    0.859527  1.556143  0.025580    3.186769       0.389850  \n",
      "26  0.058665    0.860380  1.557689  0.021003    3.206249       0.384221  \n",
      "27  0.061434    0.883125  1.598866  0.023010    3.830198       0.402561  \n",
      "34  0.054641    0.716383  4.444453  0.042347    2.957563       0.838994  \n",
      "43  0.052515    0.834267  2.417075  0.030789    3.951199       0.625661  \n",
      "46  0.077612    0.802402  2.324754  0.044227    3.314018       0.630867  \n",
      "47  0.055274    0.728142  2.891250  0.036156    2.752013       0.707863  \n",
      "53  0.122919    0.838913  1.518823  0.041989    2.778967       0.400239  \n",
      "55  0.088770    0.869477  1.574158  0.032378    3.429705       0.406212  \n",
      "58  0.084653    0.848192  1.535622  0.029527    2.948829       0.387469  \n",
      "59  0.050296    0.823989  3.334023  0.035210    4.277318       0.745572  \n",
      "60  0.099462    0.893433  1.617529  0.037972    4.200686       0.429598  \n",
      "61  0.127752    0.901896  1.632852  0.049513    4.563082       0.451533  \n",
      "67  0.056197    0.920659  1.666821  0.022482    5.642188       0.426062  \n",
      "69  0.056975    0.922122  1.669469  0.022847    5.748132       0.427416  \n",
      "70  0.053013    0.858006  1.553390  0.018886    3.152640       0.379707  \n",
      "71  0.051706    0.836858  1.515102  0.017579    2.743964       0.362368  \n",
      "72  0.075890    0.857310  1.552130  0.026996    3.137266       0.390271  \n",
      "73  0.075890    0.840377  3.400331  0.053572    4.716443       0.775986  \n",
      "74  0.075890    0.716187  3.596189  0.054787    2.821748       0.807493  \n",
      "76  0.055336    0.888445  1.608499  0.020934    4.012867       0.403429  \n",
      "77  0.054818    0.879242  1.591836  0.020381    3.707034       0.396516  \n",
      "80  0.052391    0.797223  2.309749  0.029708    3.229377       0.606939  \n"
     ]
    }
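   ],
   "source": [
    "# Keep rules with confidence >= 0.7 and lift >= 1.5\n",
    "# (the support condition repeats the min_support=0.05 already applied by apriori)\n",
    "filtered_rules = rules[\n",
    "    (rules['support'] >= 0.05) & \n",
    "    (rules['confidence'] >= 0.7) & \n",
    "    (rules['lift'] >= 1.5)\n",
    "]\n",
    "\n",
    "# Print the filtered rules and save them to CSV\n",
    "print(filtered_rules)\n",
    "filtered_rules.to_csv('关联规则.csv', index=False)\n"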
   ]
  },
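  {
   "cell_type": "markdown",
   "id": "d3b8f05a61c94e27",
   "metadata": {},
   "source": [
    "As a worked check of how the metrics relate, the values for the rule (Foggy) -> (高湿度) can be recomputed from the supports printed in the table above: antecedent support 0.074127, consequent support 0.552344, and joint support 0.073826. The printed inputs are rounded to six decimals, so the last digits may differ slightly from the table.\n",
    "\n",
    "$$\\text{confidence} = \\frac{0.073826}{0.074127} \\approx 0.9959$$\n",
    "\n",
    "$$\\text{lift} = \\frac{0.995943}{0.552344} \\approx 1.8031$$\n",
    "\n",
    "$$\\text{leverage} = 0.073826 - 0.074127 \\times 0.552344 \\approx 0.0329$$\n",
    "\n",
    "$$\\text{conviction} = \\frac{1 - 0.552344}{1 - 0.995943} \\approx 110.3$$\n",
    "\n",
    "The very large conviction reflects how rarely the rule is violated: only about 0.4% of foggy records do not have high humidity."
   ]
  },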
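  {
   "cell_type": "markdown",
   "id": "8ae5dff0",
   "metadata": {},
   "source": [
    "The output above lists a set of association rules that describe relationships among weather conditions such as temperature, humidity, and wind speed. The metrics attached to each rule let us judge how useful and reliable it is.\n",
    "\n",
    "### Basic Concepts of Association Rules\n",
    "\n",
    "- **Antecedent**: the left-hand side of the rule, i.e. the \"if\" part.\n",
    "- **Consequent**: the right-hand side of the rule, i.e. the \"then\" part.\n",
    "- **Support**: the fraction of records that contain both the antecedent and the consequent.\n",
    "- **Confidence**: the probability that the consequent occurs given the antecedent.\n",
    "- **Lift**: the ratio of the rule's confidence to the baseline frequency of the consequent; it measures how strongly the antecedent and consequent are associated.\n",
    "- **Leverage**: the difference between the observed joint frequency of antecedent and consequent and the frequency expected if they were independent.\n",
    "- **Conviction**: how much more often the rule would fail if antecedent and consequent were independent than it actually fails.\n",
    "- **Zhang's Metric**: a measure of association ranging from -1 (dissociation) to 1 (association).\n",
    "\n",
    "### Rule Analysis\n",
    "\n",
    "1. **Rule: \"Foggy\" -> \"高湿度\" (high humidity)**\n",
    "   - **Confidence**: 0.9959, very high. When the weather is foggy, humidity is almost always high.\n",
    "   - **Lift**: 1.8031. High humidity is about 1.8 times as likely in foggy conditions as it is overall, a clear positive association.\n",
    "   - **Usefulness**: For sectors such as agriculture and transportation, this rule helps anticipate foggy, high-humidity conditions and take precautions.\n",
    "\n",
    "2. **Rule: \"非常低温\" (very low temperature) -> \"冬季\" (winter)**\n",
    "   - **Confidence**: 0.8354, fairly high. When the temperature is very low, the season is most likely winter.\n",
    "   - **Lift**: 3.3801, indicating a strong link between very low temperature and winter.\n",
    "   - **Usefulness**: This rule has practical value for seasonal activity planning and energy demand forecasting.\n",
    "\n",
    "3. **Rule: \"凌晨\" (early morning) -> \"高湿度\" (high humidity)**\n",
    "   - **Confidence**: 0.8405, high. Humidity is very likely to be high in the early morning hours.\n",
    "   - **Lift**: 1.5216, showing a moderate association between early morning and high humidity.\n",
    "   - **Usefulness**: This rule can serve as a reference for weather forecasting and for scheduling outdoor activities.\n",
    "\n",
    "4. **Rule: \"非常低温\" (very low temperature) & \"高湿度\" (high humidity) -> \"冬季\" (winter)**\n",
    "   - **Confidence**: 0.8404, fairly high. When very low temperature and high humidity occur together, the season is winter in about 84% of cases. This is only slightly above the 0.8354 of rule 2, so humidity adds little beyond very low temperature alone.\n",
    "   - **Lift**: 3.4003. The high lift shows that this combination is a strong indicator of winter.\n",
    "   - **Usefulness**: This rule is relevant to studies of seasonal climate patterns.\n",
    "\n",
    "### Summary\n",
    "\n",
    "The analysis above shows that these association rules describe how likely one weather condition is given another. They are useful in fields where decisions depend on the weather, such as agricultural irrigation, traffic management, and travel planning. Although the rules hold statistically in this dataset, applying them in practice requires combining them with other factors. Geographic location and seasonal variation, for example, can change how well a rule applies, so these external factors should be taken into account when using the rules."
   ]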
  },
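  {
   "cell_type": "markdown",
   "id": "2f31dfb5",
   "metadata": {},
   "source": [
    "# Task 2: Anomaly Detection\n",
    "## 2.1 Feature Selection and Standardization\n",
    "Select the relevant numeric features, such as temperature, humidity, and wind speed, and standardize them."
   ]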
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "62ef327b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Select the features used for anomaly detection\n",
    "features = data[['Temperature (C)', 'Humidity', 'Wind Speed (km/h)','Apparent Temperature (C)','Visibility (km)','Pressure (millibars)']]\n",
    "\n",
    "# Standardize each feature to zero mean and unit variance\n",
    "scaler = StandardScaler()\n",
    "scaled_features = scaler.fit_transform(features)\n"
   ]
  },
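  {
   "cell_type": "markdown",
   "id": "e5f90b2c7a184d36",
   "metadata": {},
   "source": [
    "For reference, the standardization applied by `StandardScaler` with its default settings is the usual z-score. The symbols below are introduced only for illustration: $x_{ij}$ is the value of feature $j$ in record $i$, $\\mu_j$ is the mean of feature $j$, and $\\sigma_j$ is its population standard deviation (computed by dividing by the number of records).\n",
    "\n",
    "$$z_{ij} = \\frac{x_{ij} - \\mu_j}{\\sigma_j}$$\n",
    "\n",
    "After this transformation every column of `scaled_features` has mean 0 and variance 1, so features measured on very different scales, such as humidity in [0, 1] and pressure around 1000 millibars, become comparable."
   ]
  },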
  {
   "cell_type": "markdown",
   "id": "b8a39a2c",
   "metadata": {},
   "source": [
    "## 2.2 Anomaly Detection with Isolation Forest"
   ]
  },
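  {
   "cell_type": "markdown",
   "id": "f1a46d8e3c5b7092",
   "metadata": {},
   "source": [
    "As background, this is the anomaly score from the original Isolation Forest formulation; the exact scaling a given library reports may differ. Here $h(x)$ is the path length needed to isolate point $x$ in one tree, $E[h(x)]$ is its average over all trees, $n$ is the subsample size used to build each tree, and $H(i)$ is the $i$-th harmonic number, approximately $\\ln(i) + 0.5772$.\n",
    "\n",
    "$$s(x, n) = 2^{-\\frac{E[h(x)]}{c(n)}}, \\qquad c(n) = 2H(n-1) - \\frac{2(n-1)}{n}$$\n",
    "\n",
    "Points that are isolated after only a few splits have a short average path length, so their score approaches 1 and they are treated as anomalies. Scores well below 0.5 indicate normal points. The values 1 and -1 in the `anomaly_score` column of the output below are consistent with the convention of scikit-learn's `IsolationForest.predict`, which returns 1 for inliers and -1 for outliers."
   ]
  },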
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "70aea42d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Temperature (C) 维度异常数据：\n",
      "                  Formatted Date        Summary Precip Type  Temperature (C)  \\\n",
      "196    2006-01-09 04:00:00+00:00          Clear        snow        -5.094444   \n",
      "197    2006-01-09 05:00:00+00:00          Clear        snow        -5.138889   \n",
      "198    2006-01-09 06:00:00+00:00          Clear        snow        -6.205556   \n",
      "199    2006-01-09 07:00:00+00:00  Partly Cloudy        snow        -6.044444   \n",
      "200    2006-01-09 08:00:00+00:00  Partly Cloudy        snow        -5.138889   \n",
      "...                          ...            ...         ...              ...   \n",
      "93756  2016-09-11 15:00:00+00:00  Mostly Cloudy        rain        32.000000   \n",
      "93757  2016-09-11 16:00:00+00:00  Mostly Cloudy        rain        30.427778   \n",
      "93777  2016-09-12 12:00:00+00:00  Partly Cloudy        rain        31.038889   \n",
      "93778  2016-09-12 13:00:00+00:00  Partly Cloudy        rain        31.766667   \n",
      "93779  2016-09-12 14:00:00+00:00  Partly Cloudy        rain        30.711111   \n",
      "\n",
      "       Apparent Temperature (C)  Humidity  Wind Speed (km/h)  \\\n",
      "196                   -5.094444      0.96             3.4132   \n",
      "197                   -5.138889      0.92             4.6207   \n",
      "198                  -10.388889      0.92             9.0643   \n",
      "199                   -6.044444      0.92             3.3649   \n",
      "200                   -8.766667      0.92             8.0500   \n",
      "...                         ...       ...                ...   \n",
      "93756                 31.283333      0.34            12.6546   \n",
      "93757                 29.577778      0.35            10.6904   \n",
      "93777                 30.038889      0.33             7.1162   \n",
      "93778                 30.522222      0.30             3.4776   \n",
      "93779                 29.866667      0.34             9.1609   \n",
      "\n",
      "       Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)  ...  \\\n",
      "196                     348.0           6.4883               1035.65  ...   \n",
      "197                     338.0           6.1180               1035.28  ...   \n",
      "198                     339.0           7.9695               1035.28  ...   \n",
      "199                       9.0           6.6976               1035.63  ...   \n",
      "200                     350.0           6.1985               1035.76  ...   \n",
      "...                       ...              ...                   ...  ...   \n",
      "93756                    10.0           9.9820               1016.43  ...   \n",
      "93757                   352.0          10.5777               1016.38  ...   \n",
      "93777                   209.0          16.1000               1018.54  ...   \n",
      "93778                   123.0          15.5526               1018.09  ...   \n",
      "93779                     2.0          16.1000               1017.46  ...   \n",
      "\n",
      "      humidity_discrete  windspeed_discrete anomaly_score  anomaly  \\\n",
      "196                 高湿度                 低速风             1       正常   \n",
      "197                 高湿度                 低速风             1       正常   \n",
      "198                 高湿度                 低速风             1       正常   \n",
      "199                 高湿度                 低速风             1       正常   \n",
      "200                 高湿度                 低速风             1       正常   \n",
      "...                 ...                 ...           ...      ...   \n",
      "93756               低湿度                 中速风             1       正常   \n",
      "93757               低湿度                 中速风             1       正常   \n",
      "93777               低湿度                 低速风            -1       异常   \n",
      "93778               低湿度                 低速风            -1       异常   \n",
      "93779               低湿度                 低速风             1       正常   \n",
      "\n",
      "      Temperature (C)_anomaly Humidity_anomaly Wind Speed (km/h)_anomaly  \\\n",
      "196                      True            False                     False   \n",
      "197                      True            False                     False   \n",
      "198                      True            False                     False   \n",
      "199                      True            False                     False   \n",
      "200                      True            False                     False   \n",
      "...                       ...              ...                       ...   \n",
      "93756                    True            False                     False   \n",
      "93757                    True            False                     False   \n",
      "93777                    True            False                     False   \n",
      "93778                    True             True                     False   \n",
      "93779                    True            False                     False   \n",
      "\n",
      "      Apparent Temperature (C)_anomaly Visibility (km)_anomaly  \\\n",
      "196                              False                   False   \n",
      "197                              False                   False   \n",
      "198                               True                   False   \n",
      "199                              False                   False   \n",
      "200                              False                   False   \n",
      "...                                ...                     ...   \n",
      "93756                             True                   False   \n",
      "93757                            False                   False   \n",
      "93777                             True                   False   \n",
      "93778                             True                   False   \n",
      "93779                            False                   False   \n",
      "\n",
      "      Pressure (millibars)_anomaly  \n",
      "196                           True  \n",
      "197                          False  \n",
      "198                          False  \n",
      "199                           True  \n",
      "200                           True  \n",
      "...                            ...  \n",
      "93756                        False  \n",
      "93757                        False  \n",
      "93777                        False  \n",
      "93778                        False  \n",
      "93779                        False  \n",
      "\n",
      "[4748 rows x 28 columns]\n",
      "Humidity 维度异常数据：\n",
      "                  Formatted Date        Summary Precip Type  Temperature (C)  \\\n",
      "298    2006-01-13 10:00:00+00:00          Foggy        snow        -0.816667   \n",
      "931    2006-02-08 19:00:00+00:00          Clear        snow        -0.905556   \n",
      "973    2006-02-10 13:00:00+00:00          Foggy        snow        -0.394444   \n",
      "976    2006-02-10 16:00:00+00:00          Foggy        rain         0.483333   \n",
      "988    2006-02-11 04:00:00+00:00          Foggy        snow        -1.805556   \n",
      "...                          ...            ...         ...              ...   \n",
      "94164  2016-09-28 15:00:00+00:00  Mostly Cloudy        rain        22.266667   \n",
      "94165  2016-09-28 16:00:00+00:00  Partly Cloudy        rain        22.188889   \n",
      "94213  2016-09-30 16:00:00+00:00  Partly Cloudy        rain        25.077778   \n",
      "94439  2016-10-10 02:00:00+00:00  Partly Cloudy        rain         1.727778   \n",
      "94474  2016-10-11 13:00:00+00:00          Foggy        rain         8.133333   \n",
      "\n",
      "       Apparent Temperature (C)  Humidity  Wind Speed (km/h)  \\\n",
      "298                   -0.816667      0.97             3.3971   \n",
      "931                   -5.138889      0.98            13.1859   \n",
      "973                   -0.394444      0.98             1.9159   \n",
      "976                    0.483333      0.97             3.6064   \n",
      "988                   -1.805556      0.98             2.3345   \n",
      "...                         ...       ...                ...   \n",
      "94164                 22.266667      0.33            12.6224   \n",
      "94165                 22.188889      0.33             9.8532   \n",
      "94213                 25.077778      0.33             8.5652   \n",
      "94439                  1.727778      0.97             0.0000   \n",
      "94474                  6.766667      0.98             8.2593   \n",
      "\n",
      "       Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)  ...  \\\n",
      "298                     318.0           1.5778               1038.51  ...   \n",
      "931                     150.0           3.8801               1009.10  ...   \n",
      "973                     221.0           0.9177               1012.23  ...   \n",
      "976                     343.0           1.1592               1012.52  ...   \n",
      "988                     219.0           1.2236               1015.23  ...   \n",
      "...                       ...              ...                   ...  ...   \n",
      "94164                   279.0          16.1000               1026.83  ...   \n",
      "94165                   237.0          15.5526               1027.03  ...   \n",
      "94213                   202.0          15.5526               1017.38  ...   \n",
      "94439                     0.0           3.7191               1023.35  ...   \n",
      "94474                    30.0           2.9141               1012.58  ...   \n",
      "\n",
      "      temperature_discrete  humidity_discrete windspeed_discrete  \\\n",
      "298                   非常低温                高湿度                低速风   \n",
      "931                   非常低温                高湿度                中速风   \n",
      "973                   非常低温                高湿度                低速风   \n",
      "976                     低温                高湿度                低速风   \n",
      "988                   非常低温                高湿度                低速风   \n",
      "...                    ...                ...                ...   \n",
      "94164                   高温                低湿度                中速风   \n",
      "94165                   高温                低湿度                低速风   \n",
      "94213                   高温                低湿度                低速风   \n",
      "94439                   低温                高湿度                NaN   \n",
      "94474                   低温                高湿度                低速风   \n",
      "\n",
      "       anomaly_score anomaly Humidity_anomaly Wind Speed (km/h)_anomaly  \\\n",
      "298                1      正常             True                     False   \n",
      "931                1      正常             True                     False   \n",
      "973                1      正常             True                     False   \n",
      "976                1      正常             True                     False   \n",
      "988                1      正常             True                     False   \n",
      "...              ...     ...              ...                       ...   \n",
      "94164              1      正常             True                     False   \n",
      "94165              1      正常             True                     False   \n",
      "94213              1      正常             True                     False   \n",
      "94439              1      正常             True                      True   \n",
      "94474              1      正常             True                     False   \n",
      "\n",
      "      Apparent Temperature (C)_anomaly Visibility (km)_anomaly  \\\n",
      "298                              False                    True   \n",
      "931                              False                   False   \n",
      "973                              False                    True   \n",
      "976                              False                    True   \n",
      "988                              False                    True   \n",
      "...                                ...                     ...   \n",
      "94164                            False                   False   \n",
      "94165                            False                   False   \n",
      "94213                            False                   False   \n",
      "94439                            False                   False   \n",
      "94474                            False                   False   \n",
      "\n",
      "      Pressure (millibars)_anomaly  \n",
      "298                           True  \n",
      "931                          False  \n",
      "973                          False  \n",
      "976                          False  \n",
      "988                          False  \n",
      "...                            ...  \n",
      "94164                        False  \n",
      "94165                        False  \n",
      "94213                        False  \n",
      "94439                        False  \n",
      "94474                        False  \n",
      "\n",
      "[4223 rows x 27 columns]\n",
      "Wind Speed (km/h) 维度异常数据：\n",
      "                  Formatted Date              Summary Precip Type  \\\n",
      "15     2006-01-01 15:00:00+00:00        Mostly Cloudy        rain   \n",
      "125    2006-01-06 05:00:00+00:00                Foggy        rain   \n",
      "179    2006-01-08 11:00:00+00:00                Clear        rain   \n",
      "212    2006-01-09 20:00:00+00:00        Mostly Cloudy        snow   \n",
      "283    2006-01-12 19:00:00+00:00                Clear        snow   \n",
      "...                          ...                  ...         ...   \n",
      "94315  2016-10-04 22:00:00+00:00  Breezy and Overcast        rain   \n",
      "94368  2016-10-07 03:00:00+00:00             Overcast        rain   \n",
      "94415  2016-10-09 02:00:00+00:00        Partly Cloudy        rain   \n",
      "94431  2016-10-09 18:00:00+00:00        Mostly Cloudy        rain   \n",
      "94437  2016-10-10 00:00:00+00:00        Mostly Cloudy        rain   \n",
      "\n",
      "       Temperature (C)  Apparent Temperature (C)  Humidity  Wind Speed (km/h)  \\\n",
      "15            6.950000                  2.811111      0.74            27.5954   \n",
      "125           2.755556                  2.755556      0.96             0.5152   \n",
      "179           2.083333                  2.083333      0.73             0.6118   \n",
      "212          -1.222222                 -1.222222      0.88             0.1288   \n",
      "283          -0.038889                 -0.038889      0.86             0.5152   \n",
      "...                ...                       ...       ...                ...   \n",
      "94315        10.077778                 10.077778      0.81            29.1249   \n",
      "94368         7.516667                  7.516667      0.93             0.0000   \n",
      "94415         2.105556                  2.105556      0.96             0.0000   \n",
      "94431         7.244444                  7.244444      0.86             0.0000   \n",
      "94437         2.755556                  2.755556      0.96             0.1449   \n",
      "\n",
      "       Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)  ...  \\\n",
      "15                      139.0          11.2056               1010.29  ...   \n",
      "125                      60.0           2.7370               1020.72  ...   \n",
      "179                      86.0           9.9820               1036.24  ...   \n",
      "212                     225.0           9.9820               1033.58  ...   \n",
      "283                     187.0           4.2987               1035.92  ...   \n",
      "...                       ...              ...                   ...  ...   \n",
      "94315                     9.0          14.9569               1017.13  ...   \n",
      "94368                     0.0          15.4077               1016.73  ...   \n",
      "94415                     0.0           9.6761               1021.42  ...   \n",
      "94431                     0.0          16.1000               1022.62  ...   \n",
      "94437                   320.0           9.7566               1023.44  ...   \n",
      "\n",
      "      WindDirection  temperature_discrete humidity_discrete  \\\n",
      "15               东南                    低温               中湿度   \n",
      "125              东北                    低温               高湿度   \n",
      "179              东北                    低温               中湿度   \n",
      "212              西南                  非常低温               高湿度   \n",
      "283               南                  非常低温               高湿度   \n",
      "...             ...                   ...               ...   \n",
      "94315             北                    中温               高湿度   \n",
      "94368             北                    低温               高湿度   \n",
      "94415             北                    低温               高湿度   \n",
      "94431             北                    低温               高湿度   \n",
      "94437            西北                    低温               高湿度   \n",
      "\n",
      "       windspeed_discrete anomaly_score anomaly Wind Speed (km/h)_anomaly  \\\n",
      "15                    高速风             1      正常                      True   \n",
      "125                   低速风             1      正常                      True   \n",
      "179                   低速风             1      正常                      True   \n",
      "212                   低速风             1      正常                      True   \n",
      "283                   低速风             1      正常                      True   \n",
      "...                   ...           ...     ...                       ...   \n",
      "94315                 高速风             1      正常                      True   \n",
      "94368                 NaN             1      正常                      True   \n",
      "94415                 NaN             1      正常                      True   \n",
      "94431                 NaN             1      正常                      True   \n",
      "94437                 低速风             1      正常                      True   \n",
      "\n",
      "      Apparent Temperature (C)_anomaly Visibility (km)_anomaly  \\\n",
      "15                               False                   False   \n",
      "125                              False                   False   \n",
      "179                              False                   False   \n",
      "212                              False                   False   \n",
      "283                              False                   False   \n",
      "...                                ...                     ...   \n",
      "94315                            False                   False   \n",
      "94368                            False                   False   \n",
      "94415                            False                   False   \n",
      "94431                            False                   False   \n",
      "94437                            False                   False   \n",
      "\n",
      "      Pressure (millibars)_anomaly  \n",
      "15                           False  \n",
      "125                          False  \n",
      "179                           True  \n",
      "212                          False  \n",
      "283                           True  \n",
      "...                            ...  \n",
      "94315                        False  \n",
      "94368                        False  \n",
      "94415                        False  \n",
      "94431                        False  \n",
      "94437                        False  \n",
      "\n",
      "[4363 rows x 26 columns]\n",
      "Apparent Temperature (C) 维度异常数据：\n",
      "                  Formatted Date        Summary Precip Type  Temperature (C)  \\\n",
      "169    2006-01-08 01:00:00+00:00          Clear        snow        -4.094444   \n",
      "192    2006-01-09 00:00:00+00:00          Clear        snow        -4.933333   \n",
      "195    2006-01-09 03:00:00+00:00          Clear        snow        -4.555556   \n",
      "335    2006-01-14 23:00:00+00:00          Clear        snow        -3.772222   \n",
      "336    2006-01-15 00:00:00+00:00          Clear        snow        -4.327778   \n",
      "...                          ...            ...         ...              ...   \n",
      "93873  2016-09-16 12:00:00+00:00  Partly Cloudy        rain        28.888889   \n",
      "93874  2016-09-16 13:00:00+00:00  Mostly Cloudy        rain        28.872222   \n",
      "93875  2016-09-16 14:00:00+00:00  Mostly Cloudy        rain        29.927778   \n",
      "93877  2016-09-16 16:00:00+00:00  Mostly Cloudy        rain        28.833333   \n",
      "93897  2016-09-17 12:00:00+00:00  Mostly Cloudy        rain        28.472222   \n",
      "\n",
      "       Apparent Temperature (C)  Humidity  Wind Speed (km/h)  \\\n",
      "169                   -6.794444      0.96             6.1824   \n",
      "192                   -7.811111      0.96             6.2951   \n",
      "195                   -7.344444      0.92             6.2307   \n",
      "335                   -6.655556      0.96             6.7137   \n",
      "336                   -7.322222      0.96             6.7781   \n",
      "...                         ...       ...                ...   \n",
      "93873                 28.533333      0.40            14.1519   \n",
      "93874                 28.427778      0.39            12.0750   \n",
      "93875                 29.155556      0.36            14.2485   \n",
      "93877                 28.227778      0.37            11.8979   \n",
      "93897                 28.061111      0.40            10.0303   \n",
      "\n",
      "       Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)  ...  \\\n",
      "169                     340.0           6.6976               1035.67  ...   \n",
      "192                     350.0           9.8049               1036.26  ...   \n",
      "195                     348.0           7.9695               1035.86  ...   \n",
      "335                      40.0           6.0214               1038.04  ...   \n",
      "336                      49.0           4.0250               1037.94  ...   \n",
      "...                       ...              ...                   ...  ...   \n",
      "93873                   230.0          14.2485               1015.39  ...   \n",
      "93874                   228.0          14.7637               1014.78  ...   \n",
      "93875                   211.0          15.1823               1014.29  ...   \n",
      "93877                   223.0          14.7637               1013.44  ...   \n",
      "93897                   240.0          15.8263               1009.38  ...   \n",
      "\n",
      "      WindSpeedGroup  WindDirection temperature_discrete  humidity_discrete  \\\n",
      "169                低             西北                 非常低温                高湿度   \n",
      "192                低             西北                 非常低温                高湿度   \n",
      "195                低             西北                 非常低温                高湿度   \n",
      "335                低              北                 非常低温                高湿度   \n",
      "336                低             东北                 非常低温                高湿度   \n",
      "...              ...            ...                  ...                ...   \n",
      "93873              中             西南                   高温                低湿度   \n",
      "93874              中             西南                   高温                低湿度   \n",
      "93875              中              南                   高温                低湿度   \n",
      "93877              中              南                   高温                低湿度   \n",
      "93897              中             西南                   高温                低湿度   \n",
      "\n",
      "      windspeed_discrete anomaly_score anomaly  \\\n",
      "169                  低速风             1      正常   \n",
      "192                  低速风             1      正常   \n",
      "195                  低速风             1      正常   \n",
      "335                  低速风             1      正常   \n",
      "336                  低速风             1      正常   \n",
      "...                  ...           ...     ...   \n",
      "93873                中速风             1      正常   \n",
      "93874                中速风             1      正常   \n",
      "93875                中速风             1      正常   \n",
      "93877                中速风             1      正常   \n",
      "93897                中速风             1      正常   \n",
      "\n",
      "      Apparent Temperature (C)_anomaly Visibility (km)_anomaly  \\\n",
      "169                               True                   False   \n",
      "192                               True                   False   \n",
      "195                               True                   False   \n",
      "335                               True                   False   \n",
      "336                               True                   False   \n",
      "...                                ...                     ...   \n",
      "93873                             True                   False   \n",
      "93874                             True                   False   \n",
      "93875                             True                   False   \n",
      "93877                             True                   False   \n",
      "93897                             True                   False   \n",
      "\n",
      "      Pressure (millibars)_anomaly  \n",
      "169                          False  \n",
      "192                           True  \n",
      "195                           True  \n",
      "335                           True  \n",
      "336                           True  \n",
      "...                            ...  \n",
      "93873                        False  \n",
      "93874                        False  \n",
      "93875                        False  \n",
      "93877                        False  \n",
      "93897                        False  \n",
      "\n",
      "[4148 rows x 25 columns]\n",
      "Visibility (km) 维度异常数据：\n",
      "                  Formatted Date        Summary Precip Type  Temperature (C)  \\\n",
      "121    2006-01-06 01:00:00+00:00          Foggy        rain         2.272222   \n",
      "122    2006-01-06 02:00:00+00:00          Foggy        rain         2.827778   \n",
      "123    2006-01-06 03:00:00+00:00          Foggy        rain         2.827778   \n",
      "146    2006-01-07 02:00:00+00:00          Foggy        rain         1.644444   \n",
      "147    2006-01-07 03:00:00+00:00          Foggy        rain         0.527778   \n",
      "...                          ...            ...         ...              ...   \n",
      "95037  2016-11-04 00:00:00+00:00  Partly Cloudy        rain         5.388889   \n",
      "95050  2016-11-04 13:00:00+00:00  Mostly Cloudy        rain        10.855556   \n",
      "95061  2016-11-05 00:00:00+00:00  Partly Cloudy        rain         5.227778   \n",
      "95074  2016-11-05 13:00:00+00:00  Mostly Cloudy        rain        10.644444   \n",
      "95085  2016-11-06 00:00:00+00:00  Partly Cloudy        rain         5.072222   \n",
      "\n",
      "       Apparent Temperature (C)  Humidity  Wind Speed (km/h)  \\\n",
      "121                    0.622222      0.96             5.9570   \n",
      "122                    2.827778      0.96             2.7048   \n",
      "123                    2.827778      0.96             2.4311   \n",
      "146                    1.644444      1.00             3.5742   \n",
      "147                   -2.238889      1.00             8.4525   \n",
      "...                         ...       ...                ...   \n",
      "95037                  3.416667      0.91             8.7584   \n",
      "95050                 10.855556      0.72            13.2020   \n",
      "95061                  3.222222      0.91             8.7906   \n",
      "95074                 10.644444      0.73            13.2825   \n",
      "95085                  3.044444      0.91             8.7584   \n",
      "\n",
      "       Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)  ...  \\\n",
      "121                     337.0           0.4830               1025.32  ...   \n",
      "122                     203.0           1.7549               1020.53  ...   \n",
      "123                     253.0           1.7549               1020.65  ...   \n",
      "146                     346.0           0.2415               1026.86  ...   \n",
      "147                      21.0           0.2898               1027.15  ...   \n",
      "...                       ...              ...                   ...  ...   \n",
      "95037                   170.0          12.3326               1019.79  ...   \n",
      "95050                   177.0          12.3326               1019.40  ...   \n",
      "95061                   171.0          12.2521               1019.74  ...   \n",
      "95074                   177.0          12.2360               1019.34  ...   \n",
      "95085                   171.0          12.1716               1019.68  ...   \n",
      "\n",
      "      Season  WindSpeedGroup WindDirection  temperature_discrete  \\\n",
      "121       冬季               低            西北                    低温   \n",
      "122       冬季               低             南                    低温   \n",
      "123       冬季               低            西南                    低温   \n",
      "146       冬季               低            西北                    低温   \n",
      "147       冬季               低             北                    低温   \n",
      "...      ...             ...           ...                   ...   \n",
      "95037     秋季               低            东南                    低温   \n",
      "95050     秋季               中            东南                    中温   \n",
      "95061     秋季               低            东南                    低温   \n",
      "95074     秋季               中            东南                    中温   \n",
      "95085     秋季               低            东南                    低温   \n",
      "\n",
      "      humidity_discrete windspeed_discrete anomaly_score anomaly  \\\n",
      "121                 高湿度                低速风             1      正常   \n",
      "122                 高湿度                低速风             1      正常   \n",
      "123                 高湿度                低速风             1      正常   \n",
      "146                 高湿度                低速风             1      正常   \n",
      "147                 高湿度                低速风             1      正常   \n",
      "...                 ...                ...           ...     ...   \n",
      "95037               高湿度                低速风             1      正常   \n",
      "95050               中湿度                中速风             1      正常   \n",
      "95061               高湿度                低速风             1      正常   \n",
      "95074               中湿度                中速风             1      正常   \n",
      "95085               高湿度                低速风             1      正常   \n",
      "\n",
      "      Visibility (km)_anomaly Pressure (millibars)_anomaly  \n",
      "121                      True                        False  \n",
      "122                      True                        False  \n",
      "123                      True                        False  \n",
      "146                      True                        False  \n",
      "147                      True                        False  \n",
      "...                       ...                          ...  \n",
      "95037                    True                        False  \n",
      "95050                    True                        False  \n",
      "95061                    True                        False  \n",
      "95074                    True                        False  \n",
      "95085                    True                        False  \n",
      "\n",
      "[3926 rows x 24 columns]\n",
      "Pressure (millibars) 维度异常数据：\n",
      "                  Formatted Date        Summary Precip Type  Temperature (C)  \\\n",
      "83     2006-01-04 11:00:00+00:00       Overcast        rain         2.250000   \n",
      "114    2006-01-05 18:00:00+00:00       Overcast        rain         3.938889   \n",
      "116    2006-01-05 20:00:00+00:00       Overcast        rain         2.827778   \n",
      "167    2006-01-07 23:00:00+00:00       Overcast        snow        -0.044444   \n",
      "168    2006-01-08 00:00:00+00:00       Overcast        snow        -0.044444   \n",
      "...                          ...            ...         ...              ...   \n",
      "91620  2016-06-14 15:00:00+00:00  Partly Cloudy        rain        25.044444   \n",
      "91621  2016-06-14 16:00:00+00:00  Partly Cloudy        rain        25.033333   \n",
      "91622  2016-06-14 17:00:00+00:00  Partly Cloudy        rain        24.972222   \n",
      "91623  2016-06-14 18:00:00+00:00  Partly Cloudy        rain        23.838889   \n",
      "91624  2016-06-14 19:00:00+00:00  Partly Cloudy        rain        21.411111   \n",
      "\n",
      "       Apparent Temperature (C)  Humidity  Wind Speed (km/h)  \\\n",
      "83                    -1.100000      0.92            12.2038   \n",
      "114                    1.922222      0.92             7.9373   \n",
      "116                    0.566667      1.00             8.1305   \n",
      "167                   -0.044444      0.92             3.6547   \n",
      "168                   -0.044444      0.96             3.7030   \n",
      "...                         ...       ...                ...   \n",
      "91620                 25.044444      0.53            17.4685   \n",
      "91621                 25.033333      0.49            19.0302   \n",
      "91622                 24.972222      0.50            17.3397   \n",
      "91623                 23.838889      0.54            15.5043   \n",
      "91624                 21.411111      0.65             7.7924   \n",
      "\n",
      "       Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)  ...  \\\n",
      "83                       13.0          11.1251                  0.00  ...   \n",
      "114                     324.0           6.1985                  0.00  ...   \n",
      "116                     344.0           4.2021                  0.00  ...   \n",
      "167                      20.0           7.8729               1033.55  ...   \n",
      "168                      41.0           8.0500               1033.65  ...   \n",
      "...                       ...              ...                   ...  ...   \n",
      "91620                   290.0          10.2557                999.12  ...   \n",
      "91621                   294.0          15.5526                999.14  ...   \n",
      "91622                   300.0          16.1000                998.92  ...   \n",
      "91623                   300.0          16.1000                999.23  ...   \n",
      "91624                   283.0          15.5526               1000.08  ...   \n",
      "\n",
      "      Month  Season WindSpeedGroup  WindDirection temperature_discrete  \\\n",
      "83        1      冬季              中              北                   低温   \n",
      "114       1      冬季              低             西北                   低温   \n",
      "116       1      冬季              低             西北                   低温   \n",
      "167       1      冬季              低              北                 非常低温   \n",
      "168       1      冬季              低              北                 非常低温   \n",
      "...     ...     ...            ...            ...                  ...   \n",
      "91620     6      夏季              中              西                   高温   \n",
      "91621     6      夏季              中              西                   高温   \n",
      "91622     6      夏季              中              西                   高温   \n",
      "91623     6      夏季              中              西                   高温   \n",
      "91624     6      夏季              低              西                   高温   \n",
      "\n",
      "      humidity_discrete windspeed_discrete anomaly_score anomaly  \\\n",
      "83                  高湿度                中速风            -1      异常   \n",
      "114                 高湿度                低速风            -1      异常   \n",
      "116                 高湿度                低速风            -1      异常   \n",
      "167                 高湿度                低速风             1      正常   \n",
      "168                 高湿度                低速风             1      正常   \n",
      "...                 ...                ...           ...     ...   \n",
      "91620               中湿度                中速风             1      正常   \n",
      "91621               低湿度                中速风             1      正常   \n",
      "91622               低湿度                中速风             1      正常   \n",
      "91623               中湿度                中速风             1      正常   \n",
      "91624               中湿度                低速风             1      正常   \n",
      "\n",
      "      Pressure (millibars)_anomaly  \n",
      "83                            True  \n",
      "114                           True  \n",
      "116                           True  \n",
      "167                           True  \n",
      "168                           True  \n",
      "...                            ...  \n",
      "91620                         True  \n",
      "91621                         True  \n",
      "91622                         True  \n",
      "91623                         True  \n",
      "91624                         True  \n",
      "\n",
      "[3746 rows x 23 columns]\n",
      "处理后的数据：\n",
      "                  Formatted Date        Summary Precip Type  Temperature (C)  \\\n",
      "0      2006-01-01 00:00:00+00:00  Partly Cloudy        rain         0.577778   \n",
      "1      2006-01-01 01:00:00+00:00  Mostly Cloudy        rain         1.161111   \n",
      "2      2006-01-01 02:00:00+00:00  Mostly Cloudy        rain         1.666667   \n",
      "3      2006-01-01 03:00:00+00:00       Overcast        rain         1.711111   \n",
      "4      2006-01-01 04:00:00+00:00  Mostly Cloudy        rain         1.183333   \n",
      "...                          ...            ...         ...              ...   \n",
      "96424  2016-12-31 19:00:00+00:00  Mostly Cloudy        rain         0.488889   \n",
      "96425  2016-12-31 20:00:00+00:00  Mostly Cloudy        rain         0.072222   \n",
      "96426  2016-12-31 21:00:00+00:00  Mostly Cloudy        snow        -0.233333   \n",
      "96427  2016-12-31 22:00:00+00:00  Mostly Cloudy        snow        -0.472222   \n",
      "96428  2016-12-31 23:00:00+00:00  Mostly Cloudy        snow        -0.677778   \n",
      "\n",
      "       Apparent Temperature (C)  Humidity  Wind Speed (km/h)  \\\n",
      "0                     -4.050000      0.89            17.1143   \n",
      "1                     -3.238889      0.85            16.6152   \n",
      "2                     -3.155556      0.82            20.2538   \n",
      "3                     -2.194444      0.82            14.4900   \n",
      "4                     -2.744444      0.86            13.9426   \n",
      "...                         ...       ...                ...   \n",
      "96424                 -2.644444      0.86             9.7566   \n",
      "96425                 -3.050000      0.88             9.4185   \n",
      "96426                 -3.377778      0.89             9.2736   \n",
      "96427                 -3.644444      0.91             9.2414   \n",
      "96428                 -3.888889      0.92             9.2253   \n",
      "\n",
      "       Wind Bearing (degrees)  Visibility (km)  Pressure (millibars)  ...  \\\n",
      "0                       140.0           9.9820               1016.66  ...   \n",
      "1                       139.0           9.9015               1016.15  ...   \n",
      "2                       140.0           9.9015               1015.87  ...   \n",
      "3                       140.0           9.9015               1015.56  ...   \n",
      "4                       134.0           9.9015               1014.98  ...   \n",
      "...                       ...              ...                   ...  ...   \n",
      "96424                   167.0           8.0178               1020.03  ...   \n",
      "96425                   169.0           7.2450               1020.27  ...   \n",
      "96426                   175.0           9.5795               1020.50  ...   \n",
      "96427                   182.0           8.4042               1020.65  ...   \n",
      "96428                   189.0           8.8711               1020.72  ...   \n",
      "\n",
      "      TimeOfDay  Month Season  WindSpeedGroup WindDirection  \\\n",
      "0            凌晨      1     冬季               中            东南   \n",
      "1            凌晨      1     冬季               中            东南   \n",
      "2            凌晨      1     冬季               高            东南   \n",
      "3            凌晨      1     冬季               中            东南   \n",
      "4            凌晨      1     冬季               中             东   \n",
      "...         ...    ...    ...             ...           ...   \n",
      "96424        夜晚     12     冬季               低            东南   \n",
      "96425        夜晚     12     冬季               低            东南   \n",
      "96426        夜晚     12     冬季               低            东南   \n",
      "96427        夜晚     12     冬季               低             南   \n",
      "96428        夜晚     12     冬季               低             南   \n",
      "\n",
      "      temperature_discrete humidity_discrete windspeed_discrete anomaly_score  \\\n",
      "0                       低温               高湿度                中速风             1   \n",
      "1                       低温               高湿度                中速风             1   \n",
      "2                       低温               高湿度                高速风             1   \n",
      "3                       低温               高湿度                中速风             1   \n",
      "4                       低温               高湿度                中速风             1   \n",
      "...                    ...               ...                ...           ...   \n",
      "96424                   低温               高湿度                低速风             1   \n",
      "96425                   低温               高湿度                低速风             1   \n",
      "96426                 非常低温               高湿度                低速风             1   \n",
      "96427                 非常低温               高湿度                低速风             1   \n",
      "96428                 非常低温               高湿度                低速风             1   \n",
      "\n",
      "      anomaly  \n",
      "0          正常  \n",
      "1          正常  \n",
      "2          正常  \n",
      "3          正常  \n",
      "4          正常  \n",
      "...       ...  \n",
      "96424      正常  \n",
      "96425      正常  \n",
      "96426      正常  \n",
      "96427      正常  \n",
      "96428      正常  \n",
      "\n",
      "[71275 rows x 22 columns]\n"
     ]
    }
   ],
   "source": [
    "from sklearn.ensemble import IsolationForest\n",
    "\n",
    "# Start from a copy so the original DataFrame is left untouched\n",
    "remaining_data = data.copy()\n",
    "\n",
    "# Run anomaly detection one feature at a time\n",
    "for feature in features.columns:\n",
    "    # Fit an Isolation Forest on the current feature only\n",
    "    feature_values = remaining_data[feature].values.reshape(-1, 1)\n",
    "    isf_feature = IsolationForest(contamination=0.05, random_state=42)\n",
    "    \n",
    "    # fit_predict returns one label per row: -1 for outliers, 1 for inliers\n",
    "    feature_anomaly_score = isf_feature.fit_predict(feature_values)\n",
    "\n",
    "    # Flag the outliers (label < 0) in a temporary column\n",
    "    remaining_data[f'{feature}_anomaly'] = feature_anomaly_score < 0\n",
    "    \n",
    "    # Save the anomalous rows to CSV, then print and remove them if any exist\n",
    "    anomalies = remaining_data[remaining_data[f'{feature}_anomaly']]\n",
    "    anomalies.to_csv(f'{feature}_异常数据.csv'.replace(' ','_').replace('/','_'), index=False)\n",
    "    if not anomalies.empty: \n",
    "        print(f\"{feature} 维度异常数据：\")\n",
    "        print(anomalies)\n",
    "        # Keep only the inliers; .copy() avoids writing to a slice on the next pass\n",
    "        remaining_data = remaining_data[~remaining_data[f'{feature}_anomaly']].copy()\n",
    "    \n",
    "    # Drop the temporary flag column\n",
    "    remaining_data = remaining_data.drop(columns=[f'{feature}_anomaly'])\n",
    "\n",
    "# Print and save the cleaned data\n",
    "print(\"处理后的数据：\")\n",
    "print(remaining_data)\n",
    "remaining_data.to_csv('异常数据处理后.csv', index=False)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dm",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.20"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
